Build zipline-ai with Commands "zipline compile" and "zipline run" #161

chewy-zlai · 2024-12-23T20:22:28Z

Summary

Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai.

Checklist

[ ] Added Unit Tests
[x] Covered by existing CI
[x] Integration tested
[ ] Documentation update

Summary by CodeRabbit

Release Notes

New Features
- Introduced a new command-line interface (CLI) using the click library.
- Added zipline command group with extract_and_convert and run_main commands.
Improvements
- Enhanced command-line argument parsing and handling.
- Updated package metadata and versioning.
- Expanded supported Spark versions.
Package Changes
- Renamed package from "chronon-ai" to "zipline-ai".
- Updated package description and version.
Breaking Changes
- Modified command-line argument structure.
- Replaced argparse with click library for argument management.
Dependency Updates
- Added multiple new dependencies, including google-cloud-storage.
- Removed several dependencies related to Google Cloud services from development requirements.

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1209001371150237

…' and 'run'

coderabbitai · 2024-12-23T20:22:36Z

Walkthrough

The pull request introduces a comprehensive refactoring of command-line argument handling across multiple Python files, transitioning from argparse to the click library. The changes standardize the CLI structure, update package metadata, and create a new zipline command group that consolidates various repository-related commands. The modifications enhance script flexibility, improve parameter management, and prepare the package for more streamlined command-line interactions.

Changes

| File | Change Summary |
| --- | --- |
| `api/py/ai/chronon/repo/run.py` | Replaced `argparse` with `click`, updated function signatures, added `main()` function, expanded Spark version support |
| `api/py/setup.py` | Updated package name to `zipline-ai`, changed version, replaced `scripts` with `entry_points` |
| `api/py/ai/chronon/repo/compile.py` | Added explicit command name 'compile' to `@click.command()` |
| `api/py/ai/chronon/repo/zipline.py` | Created new CLI command group with `extract_and_convert` and `run_main` commands |
| `api/py/test/test_run.py` | Refactored tests to use `click` context instead of `argparse`, removed `parser` fixture |
| `api/py/requirements/base.in` | Added new dependency: `google-cloud-storage==2.19.0` |
| `api/py/requirements/base.txt` | Added multiple new dependencies related to Google Cloud and others |
| `api/py/requirements/dev.in` | Removed dependency: `google-cloud-storage==2.19.0` |
| `api/py/requirements/dev.txt` | Removed several dependencies related to Google Cloud services |

Possibly Related PRs

Connect run.py to DataprocSubmitter.scala so that offline jobs can be run on Dataproc #186: This PR modifies run.py to integrate with DataprocSubmitter.scala, adding functionality for submitting jobs to Google Cloud Dataproc, which is directly related to the changes made in the main PR that also involves modifications to run.py.

Suggested Reviewers

nikhil-zlai
piyush-zlai
tchow-zlai

Poem

🚀 Zipline soars on CLI wings so bright,
Click commands dance with newfound might,
From argparse to a sleeker design,
Code flows smooth, a digital shrine!
Refactoring magic takes its flight 🌈

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

Dockerfile (3)
79-79: Use COPY instead of ADD to follow best practices.
-ADD api/py /srv/chronon
+COPY api/py /srv/chronon
🧰 Tools

🪛 Hadolint (2.12.0)

[error] 79-79: Use COPY instead of ADD for files and folders

(DL3020)

80-81: Installing requirements early can benefit layer caching.

82-83: COPY for individual scripts is fine; consider consolidating copies.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4755a5d and f390818.

📒 Files selected for processing (6)

Dockerfile (1 hunks)
api/py/ai/chronon/repo/explore.py (2 hunks)
api/py/ai/chronon/repo/run.py (2 hunks)
api/py/setup.py (2 hunks)
docker-compose.yml (1 hunks)
quickstart/requirements.txt (1 hunks)

✅ Files skipped from review due to trivial changes (1)

quickstart/requirements.txt

🔇 Additional comments (9)

api/py/setup.py (3)

29-29: Version update is valid and aligns with semantic versioning.

55-61: Console scripts are an improvement over scripts= for modular CLI.

65-65: Renaming package to 'zipline-ai' is consistent with PR objectives.

api/py/ai/chronon/repo/run.py (2)

626-626: New main() function improves clarity and testing feasibility.

725-726: Entry point call is correct for script execution.

docker-compose.yml (1)

37-38: Switch to 'build' instead of pre-built image is good for local iteration.

Dockerfile (1)

85-85: Environment variables set the runtime path cleanly.

api/py/ai/chronon/repo/explore.py (2)

Line range hint 386-416: Encapsulating logic in a main() is nice.
This structuring makes the script more testable and clearer.

417-418: Good use of main() for script execution.
This avoids cluttering the global namespace. Verify that CLI args are parsed as intended.

✅ Verification successful

Let me check the main function implementation and CLI argument parsing.

CLI args are properly parsed and handled.
The main function correctly parses required and optional arguments using argparse:

Required: keyword for lookup
Optional: --conf-root with default value
Optional: --handler-args for special handlers

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

Script:

#!/bin/bash
# Find the main function implementation
ast-grep --pattern 'def main():
  $$$'

Length of output: 13201

Script:

#!/bin/bash
# Look for argument parsing
rg "ArgumentParser|argparse|sys.argv" -A 3

Length of output: 2673

Script:

#!/bin/bash
# Get the full content of explore.py to understand the context
cat api/py/ai/chronon/repo/explore.py

Length of output: 14140

api/py/setup.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

api/py/ai/chronon/repo/explore.py (2)

Line range hint 387-417: LGTM! Consider adding error handling.

Clean refactoring to improve modularity. Consider wrapping the main execution in a try-except block to handle potential exceptions gracefully.

 def main():
+    try:
         parser = argparse.ArgumentParser(description="Explore tool for chronon")
         parser.add_argument("keyword", help="Keyword to look up keys")
         parser.add_argument("--conf-root", help="Conf root for the configs", default=CWD)
         parser.add_argument(
             "--handler-args", nargs="*", help="Special arguments for handler keywords of the form param=value")
         args = parser.parse_args()
         root = args.conf_root
         if not (root.endswith("chronon") or root.endswith("zipline")):
             print("This script needs to be run from chronon conf root - with folder named 'chronon' or 'zipline', found: "
                   + root)
         teams = load_team_data(os.path.join(root, 'teams.json'))
         gb_index = build_index("group_bys", GB_INDEX_SPEC, root=root, teams=teams)
         join_index = build_index("joins", JOIN_INDEX_SPEC, root=root, teams=teams)
         enrich_with_joins(gb_index, join_index, root=root, teams=teams)

         candidate = args.keyword
         if candidate in handlers:
             print(f"{candidate} is a registered handler")
             handler = handlers[candidate]
             handler_args = {}
             for arg in args.handler_args:
                 splits = arg.split("=", 1)
                 assert len(splits) == 2, f"need args to handler for the form, param=value. Found and invalid arg:{arg}"
                 key, value = splits
                 handler_args[key] = value
             handler(**handler_args)
         else:
             group_bys = find_in_index(gb_index, args.keyword)
             display_entries(group_bys, args.keyword, root=root, trim_paths=True)
+    except Exception as e:
+        print(f"Error: {e}")
+        return 1
+    return 0

419-420: LGTM! Consider using the return code.

Clean entry point. Consider using the return code from main().

 if __name__ == "__main__":
-    main()
+    exit(main())
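
The two suggestions above (catch errors inside main() and hand its return value back to the shell) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code in explore.py: the `lookup` helper and the `argv` parameter are hypothetical and exist only so the example runs on its own. It also uses `sys.exit` rather than the bare `exit` builtin, which is installed by the `site` module mainly for interactive use.

```python
import argparse
import sys


def lookup(keyword):
    # Hypothetical stand-in for the index lookup performed in explore.py.
    if not keyword.strip():
        raise ValueError("keyword must not be blank")
    return f"results for {keyword}"


def main(argv=None):
    """Return 0 on success and 1 on failure instead of letting exceptions escape."""
    parser = argparse.ArgumentParser(description="Explore tool (sketch)")
    parser.add_argument("keyword", help="Keyword to look up keys")
    # argparse exits with status 2 on its own when the arguments are malformed.
    args = parser.parse_args(argv)
    try:
        print(lookup(args.keyword))
    except Exception as e:  # deliberately broad: last line of defence at the CLI boundary
        print(f"Error: {e}", file=sys.stderr)
        return 1
    return 0


# In a script, the entry point would forward the return code to the shell:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Accepting `argv` as a parameter keeps the function callable from tests without patching `sys.argv`.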

api/py/ai/chronon/repo/run.py (1)

295-298: Adopt ternary for brevity.

Use the suggested ternary operator for simpler logic:
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)
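
As a small sketch of what SIM108 is asking for: the if/else block, the ternary, and a plain `or` all produce the same result for this lookup. The function names below are hypothetical and only serve to compare the forms side by side.

```python
def resolve_context_if_else(params):
    # Original four-line form flagged by Ruff.
    if params["env"]:
        context = params["env"]
    else:
        context = "dev"
    return context


def resolve_context_ternary(params):
    # Ternary form proposed by SIM108.
    return params["env"] if params["env"] else "dev"


def resolve_context_or(params):
    # Shorter equivalent: `or` returns its right operand when the left one is falsy.
    return params["env"] or "dev"
```

All three fall back to "dev" for `None` and for the empty string, so the choice between them is purely stylistic.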

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ce2f019 and 01ac2aa.

📒 Files selected for processing (5)

api/py/ai/chronon/repo/compile.py (1 hunks)
api/py/ai/chronon/repo/explore.py (2 hunks)
api/py/ai/chronon/repo/run.py (10 hunks)
api/py/ai/chronon/repo/zipline.py (1 hunks)
api/py/setup.py (1 hunks)

🔇 Additional comments (9)

api/py/ai/chronon/repo/zipline.py (3)

1-2: Looks good.

Imports are straightforward and aligned with the new CLI usage.

7-9: Neat CLI group definition.

A clear group name "zipline" neatly organizes subcommands.

12-13: Commands bound successfully.

Subcommands integrate well via add_command.

api/py/ai/chronon/repo/compile.py (1)

51-51: Command name changed to 'compile'.

This improves the CLI usage consistency and readability.

api/py/setup.py (3)

55-59: Entry point is well-defined.

Centralized CLI entry point with zipline is a good move for discoverability.

60-60: Updated package description.

Switched to "Zipline python API library" aligns with rebranding.

63-63: Renamed package to "zipline-ai".

This completes the naming alignment with the new brand.

api/py/ai/chronon/repo/run.py (2)

627-669: CLI restructured with Click.

Main function and options are well-organized and easy to extend.

679-685: Jar download logic.

The fallback to download_jar is clear, ensuring correctness for embedded vs. uber jars.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

api/py/ai/chronon/repo/run.py (4)

295-298: Simplify context assignment using ternary operator

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"

🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

383-391: Extract online jar fetching logic

Move the online jar fetching logic into a separate method for better readability and maintainability.

 def __init__(self, args, jar_path):
     self.online_jar = args["online_jar"]
-    valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
-    if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
-        print("Downloading online_jar")
-        self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
-            "utf-8"
-        )
-        os.environ["CHRONON_ONLINE_JAR"] = self.online_jar
-        print("Downloaded jar to {}".format(self.online_jar))
+    self._fetch_online_jar_if_needed(args)

+def _fetch_online_jar_if_needed(self, args):
+    valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
+    if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
+        print("Downloading online_jar")
+        self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode("utf-8")
+        os.environ["CHRONON_ONLINE_JAR"] = self.online_jar
+        print("Downloaded jar to {}".format(self.online_jar))

601-624: Consider centralizing default values

Default values are currently set in both set_defaults and click options. Consider moving all defaults to a central configuration.

Also applies to: 626-665

283-290: Enhance error message for invalid conf path

Add more context to the error message by including the expected path format.

-    logging.error(
-        "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format(
-            params["conf"]
-        )
-    )
+    logging.error(
+        "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name> relative to zipline/ folder".format(
+            params["conf"]
+        )
+    )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 01ac2aa and f35985d.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (8 hunks)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

api/py/ai/chronon/repo/run.py (6)

196-200: Add version format validation

Add input validation for the version parameter to ensure it matches the expected format (e.g., X.Y.Z).

 def download_jar(
         version,
         jar_type="uber",
         release_tag=None,
         spark_version="2.4.0",
         skip_download=False,
 ):
+    # requires `import re` at the top of run.py
+    if version and version != "latest":
+        assert re.match(r'^\d+\.\d+\.\d+$', version), f"Invalid version format: {version}. Expected format: X.Y.Z"
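
A standalone sketch of the version check suggested above, assuming that X.Y.Z and "latest" are the only accepted forms (the thread does not confirm this; suffixed versions would need a looser pattern). The helper name `validate_version` is hypothetical. It raises `ValueError` instead of using `assert`, because assertions are skipped when Python runs with `-O`, and it uses `re.fullmatch`, which, unlike `re.match` with `$`, also rejects a trailing newline.

```python
import re

# Assumed format: three dot-separated integers, e.g. "0.0.1".
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def validate_version(version):
    """Return the version unchanged if it is empty, "latest", or X.Y.Z."""
    if not version or version == "latest":
        return version
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(
            f"Invalid version format: {version!r}. Expected format: X.Y.Z"
        )
    return version
```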

295-298: Simplify context assignment

Use a ternary operator for better readability.

-                if params["env"]:
-                    context = params["env"]
-                else:
-                    context = "dev"
+                context = params["env"] if params["env"] else "dev"

🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

384-390: Improve online jar validation error handling

Add explicit error message when online jar validation fails.

         valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
         # fetch online jar if necessary
         if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
+            if not args["online_jar_fetch"]:
+                raise ValueError(f"Online jar not found at {args['online_jar']} and no fetch command provided")
             print("Downloading online_jar")

579-580: Add date format validation

Add explicit date format validation before processing.

+    def validate_date_format(date_str):
+        try:
+            return datetime.strptime(date_str, "%Y-%m-%d")
+        except ValueError:
+            raise ValueError(f"Invalid date format: {date_str}. Expected format: YYYY-MM-DD")
+
+    start_date = validate_date_format(start_date)
+    end_date = validate_date_format(end_date)
-    start_date = datetime.strptime(start_date, "%Y-%m-%d")
-    end_date = datetime.strptime(end_date, "%Y-%m-%d")
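
The date validation proposed above could look like the sketch below when written as a module-level helper. The name `parse_partition_date` and the `DATE_FORMAT` constant are assumptions for illustration; chaining with `from err` keeps the original `strptime` error visible in the traceback.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # same format string used in the diff above


def parse_partition_date(date_str):
    """Parse a YYYY-MM-DD string, re-raising with a message that names the expected format."""
    try:
        return datetime.strptime(date_str, DATE_FORMAT)
    except ValueError as err:
        raise ValueError(
            f"Invalid date format: {date_str!r}. Expected format: YYYY-MM-DD"
        ) from err
```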

627-668: Improve help messages clarity

The help messages for some options could be more descriptive.

-@click.option("--ds", help="the end partition to backfill the data", default=datetime.today().strftime("%Y-%m-%d"))
+@click.option("--ds", help="End partition date for data backfill (format: YYYY-MM-DD)", 
+              default=datetime.today().strftime("%Y-%m-%d"))

679-688: Move jar path expansion earlier

Move the path expansion before the jar download to avoid potential issues.

     jar_path = (
-        chronon_jar
+        os.path.expanduser(chronon_jar)
         if chronon_jar
         else download_jar(
             version,
             jar_type=jar_type,
             release_tag=release_tag,
             spark_version=os.environ.get("SPARK_VERSION", spark_version),
         )
     )
-    Runner(ctx.params, os.path.expanduser(jar_path)).run()
+    Runner(ctx.params, jar_path).run()

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f35985d and fef3186.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (10 hunks)

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (2)

api/py/ai/chronon/repo/run.py (2)
Line range hint 244-273: Improve error handling in environment setup.

The environment setup lacks proper error handling for missing or invalid environment variables.

Add try/except blocks and validation:
 def set_runtime_env(params):
+    if not isinstance(params, dict):
+        raise ValueError("params must be a dictionary")
     effective_mode = params["mode"]
+    if effective_mode is None:
+        raise ValueError("mode parameter is required")
🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

Line range hint 488-497: Add retry mechanism for streaming app verification.

The streaming app verification lacks retry logic, which could fail due to temporary network issues.
+@retry_decorator(retries=3, backoff=20)
+def verify_running_apps(app_name, list_apps_cmd):
+    running_apps = check_output("{}".format(list_apps_cmd)).decode("utf-8").split("\n")
+    return [json.loads(app.strip()) for app in running_apps if app.strip()]

 if len(filtered_apps) > 0:
     if self.mode == "streaming":
         assert len(filtered_apps) == 1, "More than one found, please kill them all"
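
The excerpt above does not show how `retry_decorator` is defined. The following is one way such a decorator could look, written as a sketch; it is not necessarily the implementation in run.py. Here `retries` is assumed to mean additional attempts after the first call, and `backoff` a fixed sleep in seconds between attempts.

```python
import functools
import time


def retry_decorator(retries=3, backoff=20):
    """Retry the wrapped function up to `retries` extra times, sleeping `backoff` seconds between tries."""

    def wrapper(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    attempt += 1
                    if attempt > retries:
                        # Out of retries: surface the last error to the caller.
                        raise
                    time.sleep(backoff)

        return wrapped

    return wrapper
```

With `retries=3` the function is called at most four times. Retrying on every `Exception` is a simplification; a narrower exception type would avoid retrying on programming errors.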

🧹 Nitpick comments (3)

api/py/ai/chronon/repo/run.py (3)

295-298: Simplify context assignment using ternary operator.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"

🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

627-688: Enhance CLI documentation and type validation.

The CLI options lack type validation and comprehensive documentation.

 @click.option("--parallelism",
+              type=click.IntRange(min=1),
               help="break down the backfill range into this number of tasks in parallel. "
                    "Please use it along with --start-ds and --end-ds and only in manual mode")
 @click.option("--repo",
+              type=click.Path(exists=True, file_okay=False, dir_okay=True),
               help="Path to chronon repo",
               default=os.environ.get("CHRONON_REPO_PATH", "."))

Line range hint 579-597: Optimize date range splitting for large ranges.

The date range splitting function could be optimized for memory efficiency with large date ranges.

 def split_date_range(start_date, end_date, parallelism):
+    if not isinstance(parallelism, int) or parallelism < 1:
+        raise ValueError("Parallelism must be a positive integer")
     start_date = datetime.strptime(start_date, "%Y-%m-%d")
     end_date = datetime.strptime(end_date, "%Y-%m-%d")
-    date_ranges = []
+    for i in range(parallelism):
+        split_start = start_date + timedelta(days=i * split_size)
+        split_end = end_date if i == parallelism - 1 else split_start + timedelta(days=split_size - 1)
+        yield (split_start.strftime("%Y-%m-%d"), split_end.strftime("%Y-%m-%d"))
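
The hunk above omits the lines that compute `total_days` and `split_size`, so here is a complete sketch of a generator-based version under stated assumptions: the range is inclusive of the end date (as the comment on lines 583-584 notes), `split_size` is the floor of `total_days / parallelism`, and the last chunk absorbs any remainder. The check that rejects more tasks than days is an addition for this example, not something the thread specifies.

```python
from datetime import datetime, timedelta


def split_date_range(start_date, end_date, parallelism):
    """Lazily yield (start, end) string pairs covering the inclusive range."""
    if not isinstance(parallelism, int) or parallelism < 1:
        raise ValueError("Parallelism must be a positive integer")
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    total_days = (end - start).days + 1  # +1 because the end date is included
    if total_days < parallelism:
        # Assumption: refuse to create empty chunks.
        raise ValueError("Parallelism exceeds the number of days in the range")
    split_size = total_days // parallelism
    for i in range(parallelism):
        split_start = start + timedelta(days=i * split_size)
        # The final chunk runs to the end date and picks up leftover days.
        split_end = end if i == parallelism - 1 else split_start + timedelta(days=split_size - 1)
        yield (split_start.strftime("%Y-%m-%d"), split_end.strftime("%Y-%m-%d"))
```

One consequence of turning the function into a generator is that the validation errors are raised on first iteration rather than at call time, so callers that previously relied on an immediate exception would need to consume the result (for example with `list(...)`).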

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between fef3186 and ba5bcc9.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (11 hunks)

coderabbitai · 2025-01-03T00:59:57Z

api/py/ai/chronon/repo/run.py

+        self.repo = args["repo"]
+        self.conf = args["conf"]
+        self.sub_help = args["sub_help"]
+        self.mode = args["mode"]
+        self.online_jar = args["online_jar"]
+        valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])


⚠️ Potential issue

Validate input parameters in Runner initialization.

The Runner class accepts parameters without validation.

Add parameter validation:

 def __init__(self, args, jar_path):
+    if not isinstance(args, dict):
+        raise ValueError("args must be a dictionary")
+    if not jar_path or not isinstance(jar_path, str):
+        raise ValueError("jar_path must be a non-empty string")
     self.repo = args["repo"]

Committable suggestion skipped: line range outside the PR's diff.

…wys/setup_python_build

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

api/py/ai/chronon/repo/run.py (2)
295-298: Simplify context assignment using ternary operator.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

673-675: Add docstring to main function.

Add documentation describing the function's purpose and parameters.
 def main(ctx, conf, env, mode, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class, version,
          spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type,
          online_args, chronon_jar, release_tag, list_apps, render_info):
+    """Execute the Chronon pipeline with the specified configuration.
+    
+    Args:
+        ctx: Click context object
+        conf: Configuration path
+        env: Environment (dev/prod)
+        [...]
+    """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ba5bcc9 and 80384d9.

📒 Files selected for processing (2)

api/py/ai/chronon/repo/compile.py (1 hunks)
api/py/ai/chronon/repo/run.py (11 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

api/py/ai/chronon/repo/compile.py

🔇 Additional comments (2)

api/py/ai/chronon/repo/run.py (2)

583-584: LGTM: Correct handling of inclusive date ranges.

383-388: 🛠️ Refactor suggestion

Add parameter validation in Runner initialization.

Add checks for required parameters to prevent runtime errors.
 def __init__(self, args, jar_path):
+    if not isinstance(args, dict):
+        raise ValueError("args must be a dictionary")
+    if not jar_path:
+        raise ValueError("jar_path is required")
     self.repo = args["repo"]
     self.conf = args["conf"]
Likely invalid or redundant comment.

…ssing

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (6)

api/py/ai/chronon/repo/run.py (2)
295-296: Simplify context assignment using ternary operator.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

605-632: Consider environment variable prefix constants.

Define a constant for the environment variable prefix "CHRONON_" to avoid string duplication.
+ENV_PREFIX = "CHRONON_"
 def set_defaults(ctx):
     """Set default values based on environment"""
-    chronon_repo_path = os.environ.get("CHRONON_REPO_PATH", ".")
+    chronon_repo_path = os.environ.get(f"{ENV_PREFIX}REPO_PATH", ".")
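
Taking the prefix-constant idea one step further, the lookup itself could be wrapped in a small helper so call sites pass only the suffix. This is a sketch; the helper name `chronon_env` is hypothetical and does not appear in run.py.

```python
import os

ENV_PREFIX = "CHRONON_"


def chronon_env(name, default=None):
    """Read CHRONON_<name> from the environment, falling back to `default`."""
    return os.environ.get(f"{ENV_PREFIX}{name}", default)
```

For example, `chronon_env("REPO_PATH", ".")` reads the same variable as the `os.environ.get` call in the diff above.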
api/py/test/test_run.py (4)

31-47: Consider using a pytest fixture.
This helper could be converted into a fixture for more flexible reuse.

83-128: Reduce environment reset duplication.
Repeated code for resetting environment variables can be placed in a shared setup or fixture.

Also applies to: 144-149, 159-164, 179-183, 188-192

169-175: Use a more specific exception.
Catching Exception is too broad. Consider a narrower exception class for clarity.

🧰 Tools

🪛 Ruff (0.8.2)

169-169: pytest.raises(Exception) should be considered evil

(B017)

Line range hint 271-328: Consider unifying repeated steps.
Streaming and streaming-client tests share logic. A helper could reduce duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 80384d9 and 7679303.

📒 Files selected for processing (2)

api/py/ai/chronon/repo/run.py (11 hunks)
api/py/test/test_run.py (8 hunks)

🔇 Additional comments (9)

api/py/ai/chronon/repo/run.py (5)

196-200: LGTM: Download jar function signature and validation.

The function correctly validates spark versions against supported versions.

Also applies to: 203-203

383-388: Add input validation in Runner initialization.

The past review comment about validating input parameters in Runner initialization is still valid.

Also applies to: 390-392

410-412: LGTM: Mode validation logic.

The code properly validates modes against possible modes for the configuration type.

Also applies to: 415-416

583-584: LGTM: Date range calculation.

The date range calculation correctly includes the end date.

634-701: LGTM: Click command setup.

The command-line interface is well-structured with appropriate options and defaults.

api/py/test/test_run.py (4)

19-19: Good import for Click usage.

197-210: Looks good.
No issues spotted with property defaults logic.

213-241: Clean parameter management.
Render info testing appears consistent with the new approach.

Line range hint 244-264: No concerns.
Tests properly validate rendering script usage.

coderabbitai · 2025-01-05T20:15:34Z

api/py/ai/chronon/repo/run.py

+    effective_mode = params["mode"]
    if effective_mode and "streaming" in effective_mode:
        effective_mode = "streaming"
-    if args.repo:
-        teams_file = os.path.join(args.repo, "teams.json")
+    if params["repo"]:
+        teams_file = os.path.join(params["repo"], "teams.json")


⚠️ Potential issue

Add error handling for missing repo path.

The code assumes repo path exists without validation.

 if params["repo"]:
+    if not os.path.exists(params["repo"]):
+        raise ValueError(f"Repository path does not exist: {params['repo']}")
     teams_file = os.path.join(params["repo"], "teams.json")

Also applies to: 281-287

coderabbitai · 2025-01-05T20:15:34Z

api/py/ai/chronon/repo/run.py

+        self.args = args["args"] if args["args"] else ""
+        self.online_class = args["online_class"]
+        self.app_name = args["app_name"]


⚠️ Potential issue

Validate paths before assignment.

File paths should be validated before assignment.

-self.spark_submit = args["spark_submit_path"]
+if not os.path.exists(args["spark_submit_path"]):
+    raise ValueError(f"Spark submit path does not exist: {args['spark_submit_path']}")
+self.spark_submit = args["spark_submit_path"]

Also applies to: 430-430, 433-435, 437-438

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

api/py/ai/chronon/repo/run.py (3)
196-200: Add input validation for spark_version.

The assertion message is good, but add upfront validation.
 def download_jar(
         version,
         jar_type="uber",
         release_tag=None,
         spark_version="2.4.0",
         skip_download=False,
 ):
+    if not isinstance(spark_version, str):
+        raise ValueError(f"spark_version must be a string, got {type(spark_version)}")
     assert (
         spark_version in SUPPORTED_SPARK
     ), f"Received unsupported spark version {spark_version}. Supported spark versions are {SUPPORTED_SPARK}"
Also applies to: 203-203
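A minimal sketch of the upfront validation suggested above, written with explicit exceptions rather than `assert` (assertions are skipped when Python runs with `-O`). The contents of `SUPPORTED_SPARK` and the helper name `check_spark_version` are assumptions for illustration, and the sketch raises `TypeError` for a non-string where the suggestion uses `ValueError`.

```python
# Assumed values for illustration; the real list is defined in run.py.
SUPPORTED_SPARK = ["2.4.0", "3.1.1", "3.2.1", "3.5.1"]


def check_spark_version(spark_version):
    """Return spark_version unchanged if it is a supported version string."""
    if not isinstance(spark_version, str):
        # A float such as 3.5 would otherwise fail the membership test
        # with a less helpful message.
        raise TypeError(
            f"spark_version must be a string, got {type(spark_version).__name__}"
        )
    if spark_version not in SUPPORTED_SPARK:
        raise ValueError(
            f"Received unsupported spark version {spark_version}. "
            f"Supported spark versions are {SUPPORTED_SPARK}"
        )
    return spark_version
```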

295-296: Use ternary operator for context assignment.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
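For reference, the if-else block and the ternary form behave identically, and so does the shorter `or` form, because all three fall back to "dev" for any falsy value (`None` or an empty string). The function names below are hypothetical and exist only to compare the forms side by side.

```python
def context_if_else(params):
    # Original form flagged by SIM108.
    if params["env"]:
        context = params["env"]
    else:
        context = "dev"
    return context


def context_ternary(params):
    # Form suggested by the linter.
    return params["env"] if params["env"] else "dev"


def context_or(params):
    # Equivalent shorthand: `or` returns the right operand when the left is falsy.
    return params["env"] or "dev"
```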
🧰 Tools

🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

583-584: Use datetime.timedelta directly for date difference.

Simplify the date range calculation.
-    total_days = (
-                         end_date - start_date
-                 ).days + 1
+    total_days = (end_date - start_date).days + 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7679303 and 1d52e11.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (11 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🔇 Additional comments (5)

api/py/ai/chronon/repo/run.py (5)

270-274: Add path validation for repo and conf.

The code assumes paths exist without proper validation.

Also applies to: 281-287

383-388: Add input validation for Runner initialization.

Also applies to: 390-392

415-416: Validate paths before assignment.

Also applies to: 418-418, 421-422, 426-428, 430-430, 433-435, 437-438

605-632: LGTM! Good default configuration setup.

The function properly handles environment variables and default paths.

634-696: Well-structured CLI using Click.

Good job on:

Comprehensive options with clear help messages

Environment variable fallbacks

Flexible argument handling with allow_extra_args
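The three points listed above can be sketched with a small click group. This is an illustration under stated assumptions, not the command defined in run.py: the group function, the option set, and the `CHRONON_REPO_PATH` variable name are all chosen for the example.

```python
import click


@click.group()
def zipline():
    """Hypothetical top-level command group."""


@zipline.command(
    name="run",
    # Unknown options are passed through instead of raising a usage error,
    # and are collected in ctx.args.
    context_settings=dict(allow_extra_args=True, ignore_unknown_options=True),
)
@click.option("--conf", help="Conf path relative to the repo.")
@click.option("--env", default="dev", show_default=True, help="Target environment.")
@click.option(
    "--repo",
    envvar="CHRONON_REPO_PATH",  # assumed variable name; used when --repo is omitted
    default=".",
    help="Path to the repo root.",
)
@click.pass_context
def run(ctx, conf, env, repo):
    """Echo the parsed options and any extra arguments."""
    click.echo(f"conf={conf} env={env} repo={repo} extra={ctx.args}")


if __name__ == "__main__":
    zipline()
```

Resolution order for `--repo` is the explicit flag, then the environment variable, then the default. Extra arguments collected in `ctx.args` can be forwarded to a downstream job without declaring each one as a click option.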

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

api/py/ai/chronon/repo/run.py (1)
Line range hint 307-311: Add path traversal protection.

Validate conf_path to prevent directory traversal attacks.
 conf_path = os.path.join(params["repo"], params["conf"])
+conf_path = os.path.abspath(conf_path)
+if not conf_path.startswith(os.path.abspath(params["repo"])):
+    raise ValueError("Invalid conf path: Attempted path traversal")
 if os.path.isfile(conf_path):
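One caveat with the `startswith` check suggested above: it compares raw string prefixes, so a sibling directory such as `/data/repo-evil` would pass a check against `/data/repo`. A hedged alternative is to compare whole path components with `os.path.commonpath`. The helper name `resolve_conf_path` is hypothetical.

```python
import os


def resolve_conf_path(repo: str, conf: str) -> str:
    """Join conf onto repo and reject results that fall outside repo."""
    repo_root = os.path.realpath(repo)
    conf_path = os.path.realpath(os.path.join(repo_root, conf))
    # commonpath works on path components, so "/data/repo-evil" is not
    # treated as being inside "/data/repo".
    if os.path.commonpath([repo_root, conf_path]) != repo_root:
        raise ValueError(f"Invalid conf path: {conf} resolves outside {repo_root}")
    return conf_path
```

`os.path.realpath` also resolves symlinks, which `os.path.abspath` does not. Whether that stricter behavior is wanted depends on how repos are laid out.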
🧰 Tools

🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🧹 Nitpick comments (2)

api/py/ai/chronon/repo/run.py (2)
300-303: Simplify context assignment.

Use a ternary operator for better readability.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

793-827: Add docstring and type hints to main function.

The main function would benefit from:

A comprehensive docstring explaining the command's purpose

Type hints for all parameters
@click.command(name="run", context_settings=dict(allow_extra_args=True, ignore_unknown_options=True))
+def main(
+    ctx: click.Context,
+    conf: Optional[str],
+    env: str,
+    # ... other parameters with type hints
+) -> None:
+    """Execute Chronon jobs with specified configuration and parameters.
+    
+    Args:
+        ctx: Click context for handling command state
+        conf: Configuration path relative to repo
+        env: Environment (dev/prod)
+        # ... document other parameters
+    """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 9e015ee and a2cb521.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (12 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: no_spark_scala_tests

🔇 Additional comments (2)

api/py/ai/chronon/repo/run.py (2)

388-393: Add parameter validation in Runner initialization.

Critical parameters should be validated to prevent runtime errors.

64-65: Verify Spark 3.5.1 compatibility.

The addition of Spark 3.5.1 support needs validation.

✅ Verification successful

Spark 3.5.1 support properly configured

Build files correctly define and use Spark 3.5.1

Version validation and Scala mappings are consistent
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check if Spark 3.5.1 is properly supported across the codebase
rg -g '*.{scala,java}' 'spark.*3\.5\.1'
Length of output: 41

Script:
#!/bin/bash
# Check build files and Python code for Spark version references
echo "=== Build Files ==="
fd -e xml -e sbt . | xargs rg "3\.5\.1|spark"

echo -e "\n=== Python Files ==="
rg -g "*.py" "spark.*version|SUPPORTED_SPARK"

echo -e "\n=== Test Files ==="
fd -e py "test_" | xargs rg "spark.*version|3\.5\.1"
Length of output: 5601

Script:
#!/bin/bash
# Check for version-specific implementations
rg -g "*.{scala,py}" "(?:if|match).*(?:spark|version).*3\." -A 3
Length of output: 66

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

api/py/ai/chronon/repo/run.py (1)

388-393: ⚠️ Potential issue

Validate input parameters in Runner initialization.

Add parameter validation for args and jar_path.
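The review does not spell out what this validation would look like. One possible shape, assuming `args` is the dict of click parameters and `jar_path` is a string, is sketched below. The `REQUIRED_KEYS` tuple and the function name are hypothetical; which keys are actually mandatory depends on the Runner.

```python
# Hypothetical set of keys the Runner cannot work without.
REQUIRED_KEYS = ("mode", "repo")


def validate_runner_inputs(args: dict, jar_path: str) -> None:
    """Fail fast with a clear message instead of a KeyError deep inside the Runner."""
    missing = [key for key in REQUIRED_KEYS if key not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {', '.join(missing)}")
    if not isinstance(jar_path, str) or not jar_path.strip():
        raise ValueError("jar_path must be a non-empty string")
```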

🧹 Nitpick comments (1)

api/py/ai/chronon/repo/run.py (1)
300-303: Use ternary operator for conciseness.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a2cb521 and cc331be.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (12 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: no_spark_scala_tests

🔇 Additional comments (3)

api/py/ai/chronon/repo/run.py (3)

64-65: LGTM: Spark 3.5.1 support added correctly.

794-846: LGTM: Well-structured click command setup.

Command options are comprehensive and well-documented.

275-279: ⚠️ Potential issue

Add path validation.

Validate repository path before accessing files.
 if params["repo"]:
+    if not os.path.exists(params["repo"]):
+        raise ValueError(f"Repository path does not exist: {params['repo']}")
     teams_file = os.path.join(params["repo"], "teams.json")
Likely invalid or redundant comment.

coderabbitai · 2025-01-10T00:46:00Z

api/py/ai/chronon/repo/run.py

+        if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
            print("Downloading online_jar")
-            self.online_jar = check_output("{}".format(args.online_jar_fetch)).decode(
+            self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
                "utf-8"
            )


⚠️ Potential issue

Add error handling for jar download.

Add try-except block to handle download failures gracefully.

 if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
     print("Downloading online_jar")
+    try:
         self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
             "utf-8"
         )
+    except subprocess.CalledProcessError as e:
+        raise RuntimeError(f"Failed to download online jar: {e}")
     os.environ["CHRONON_ONLINE_JAR"] = self.online_jar

Committable suggestion skipped: line range outside the PR's diff.
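A standalone sketch of the error handling proposed above. It calls `subprocess.check_output` directly rather than the `check_output` helper used in run.py, and the function name `fetch_online_jar` is an assumption for illustration.

```python
import shlex
import subprocess


def fetch_online_jar(fetch_command: str) -> str:
    """Run the fetch command and return the jar path it prints on stdout."""
    try:
        output = subprocess.check_output(shlex.split(fetch_command))
    except (subprocess.CalledProcessError, OSError) as e:
        # CalledProcessError covers a non-zero exit status; OSError covers a
        # missing or non-executable command.
        raise RuntimeError(f"Failed to download online jar: {e}") from e
    # Strip the trailing newline so the value is usable as a path.
    return output.decode("utf-8").strip()
```

Chaining with `from e` keeps the original traceback, which helps when the fetch script fails for reasons unrelated to the download itself.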

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

api/py/ai/chronon/repo/run.py (2)

288-294: Improve error message specificity.

The error message could be more helpful by suggesting the correct path format.

-                    logging.error(
-                        "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format(
-                            params["conf"]
-                        )
-                    )
+                    logging.error(
+                        "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name>".format(
+                            params["conf"]
+                        )
+                    )
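To make the expected format in the proposed message concrete, here is a small sketch that splits a conf path into its last four components. The function name `parse_conf_path` and the sample path used below are hypothetical.

```python
def parse_conf_path(conf: str):
    """Split a conf path into (context, conf_type, team, name)."""
    parts = conf.strip("/").split("/")
    # Only the last four components matter, so a longer prefix is allowed.
    if len(parts) < 4 or not all(parts[-4:]):
        raise ValueError(
            f"Invalid conf path: {conf}. "
            "Expected format: <context>/<conf_type>/<team>/<name>"
        )
    context, conf_type, team, name = parts[-4:]
    return context, conf_type, team, name
```

For example, `parse_conf_path("production/joins/team_a/my_join.v1")` yields the four components in order, while `"joins/team_a"` is rejected with the format hint.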

300-303: Simplify context assignment.

Use a ternary operator for cleaner code.

-                if params["env"]:
-                    context = params["env"]
-                else:
-                    context = "dev"
+                context = params["env"] if params["env"] else "dev"

🧰 Tools

🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cc331be and 8e23f46.

📒 Files selected for processing (1)

api/py/ai/chronon/repo/run.py (12 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: no_spark_scala_tests

🔇 Additional comments (5)

api/py/ai/chronon/repo/run.py (5)

21-21: LGTM! Good choice of CLI framework.

The transition to Click is a good choice as it provides better CLI structuring capabilities.

388-393: Add input validation.

Previous review comment about parameter validation remains applicable.

794-828: LGTM! Well-structured CLI implementation.

The click command is well-organized with descriptive help texts and appropriate option types.

431-443: ⚠️ Potential issue

Validate paths before assignment.

Add existence checks for spark_submit paths.
+        def _validate_path(path, name):
+            if not os.path.exists(path):
+                raise ValueError(f"Invalid {name}: {path}")
+
         if self.mode == "streaming":
+            _validate_path(args["spark_streaming_submit_path"], "spark streaming submit path")
             self.spark_submit = args["spark_streaming_submit_path"]
         elif self.mode == "info":
             assert os.path.exists(
                 args["render_info"]
             ), "Invalid path for the render info script: {}".format(args["render_info"])
             self.render_info = args["render_info"]
         else:
+            _validate_path(args["spark_submit_path"], "spark submit path")
             self.spark_submit = args["spark_submit_path"]
Likely invalid or redundant comment.

64-65: Verify Spark 3.5.1 compatibility.

While adding Spark 3.5.1 support is good, ensure all features work with this version.

Run this to check for potential compatibility issues:

✅ Verification successful

Spark 3.5.1 compatibility verified successfully.

The codebase properly handles Spark 3.5.x specific changes and uses stable APIs. Version-specific features like error messages are correctly implemented in tests.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for Spark version-specific code or configurations
rg -A 3 "spark.*3\.[0-9]" 
Length of output: 5105

Script:
#!/bin/bash
# Check for SQL queries and potential deprecated features
rg -g '*.{scala,java,py}' '(?i)spark\.sql|createDataFrame|withColumn|udf|rdd\.' -A 2

# Look for test configurations
fd -e scala -e java test$ -x cat {} 
Length of output: 76729

Script:
#!/bin/bash
# Check for test configurations and Spark version settings
rg -g '*.{scala,java,xml,conf}' 'SparkSession\.builder|spark\.version|spark-version'
Length of output: 361

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

api/py/requirements/base.txt (1)

1-7: Consider pinning pip-compile-multi version

Add pip-compile-multi version to dev requirements for reproducible builds.
api/py/ai/chronon/repo/run.py (2)
300-303: Simplify context assignment using ternary operator.
-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools

🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

826-828: Add type hints to main function parameters.

Improve code maintainability by adding type hints.
-def main(ctx, conf, env, mode, dataproc, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class,
-         version, spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type,
-         online_args, chronon_jar, release_tag, list_apps, render_info):
+def main(ctx: click.Context, conf: str, env: str, mode: str, dataproc: bool, ds: str, app_name: str, start_ds: str,
+         end_ds: str, parallelism: int, repo: str, online_jar: str, online_class: str, version: str,
+         spark_version: str, spark_submit_path: str, spark_streaming_submit_path: str, online_jar_fetch: str,
+         sub_help: bool, conf_type: str, online_args: str, chronon_jar: str, release_tag: str,
+         list_apps: str, render_info: str) -> None:

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 8e23f46 and 20bbf1e.

📒 Files selected for processing (2)

api/py/ai/chronon/repo/run.py (12 hunks)
api/py/requirements/base.txt (1 hunks)

🧰 Additional context used

🪛 LanguageTool

api/py/requirements/base.txt

[duplication] ~24-~24: Possible typo: you repeated a word.
Context: ...rage google-cloud-core==2.4.1 # via google-cloud-storage google-cloud-storage==2.19.0 # via -r requirements/base....

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~30-~30: Possible typo: you repeated a word.
Context: ...ia # google-cloud-storage # google-resumable-media google-resumable-media==2.7.2 # via google-cloud-storage g...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~57-~57: Possible typo: you repeated a word.
Context: ...# via google-auth six==1.17.0 # via thrift thrift==0.21.0 # via -r requirements/base....

(ENGLISH_WORD_REPEAT_RULE)

🪛 Ruff (0.8.2)

api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: no_spark_scala_tests

🔇 Additional comments (5)

api/py/requirements/base.txt (2)

50-54: Check requests package security advisories

Version 2.32.3 is recent but verify security status.

✅ Verification successful

Requests package 2.32.3 is secure
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check for CVEs
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "requests", severities: [CRITICAL, HIGH]) {
    nodes {
      advisory {
        summary
        severity
      }
      vulnerableVersionRange
    }
  }
}'
Length of output: 426
25-26: Verify google-cloud-storage version compatibility

Version 2.19.0 is relatively recent but check compatibility with your Python version.

✅ Verification successful

Python version compatibility verified

The google-cloud-storage 2.19.0 requirement (Python >=3.7) matches the project's Python version specification.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check minimum Python version requirement
curl -s https://pypi.org/pypi/google-cloud-storage/2.19.0/json | jq '.info.requires_python'
Length of output: 99

Script:
#!/bin/bash
# Find Python version specs
fd -e toml -e py -e cfg -e txt -e yaml -e yml -x grep -l "python" {} \; | xargs -I {} sh -c 'echo "=== {}:"; cat {}'
Length of output: 68129
api/py/ai/chronon/repo/run.py (3)

64-65: LGTM: Spark 3.5.1 support added correctly.

The mapping to Scala 2.12 for Spark 3.5.1 is accurate.

388-393: Validate input parameters in Runner initialization.

The Runner class still accepts parameters without validation.

275-278: ⚠️ Potential issue

Add path validation for repo directory.

Add existence check for the repo directory before accessing files within it.
 effective_mode = params["mode"]
 if effective_mode and "streaming" in effective_mode:
     effective_mode = "streaming"
+if params["repo"] and not os.path.exists(params["repo"]):
+    raise ValueError(f"Repository path does not exist: {params['repo']}")
 if params["repo"]:
Likely invalid or redundant comment.

) ## Summary Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai. ## Checklist - [ ] Added Unit Tests - [ x] Covered by existing CI - [ x] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit ## Release Notes - **New Features** - Introduced a new command-line interface (CLI) using the `click` library. - Added `zipline` command group with `extract_and_convert` and `run_main` commands. - **Improvements** - Enhanced command-line argument parsing and handling. - Updated package metadata and versioning. - Expanded supported Spark versions. - **Package Changes** - Renamed package from "chronon-ai" to "zipline-ai". - Updated package description and version. - **Breaking Changes** - Modified command-line argument structure. - Replaced `argparse` with `click` library for argument management. - **Dependency Updates** - Added multiple new dependencies, including `google-cloud-storage`. - Removed several dependencies related to Google Cloud services from development requirements.  --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209001371150237 --------- Co-authored-by: tchow <[email protected]>

chewy-zlai added 3 commits December 19, 2024 12:56

Test quickstart using local compile.py

f99047c

comment out chronon-ai requirement

10f3c71

Rename repo to zipline-ai, and rename commands to 'compile', 'explore' and 'run'

f390818

chewy-zlai requested a review from nikhil-zlai December 23, 2024 20:22

coderabbitai bot reviewed Dec 23, 2024

chewy-zlai added 5 commits December 23, 2024 12:27

unstage changes to docker files

ce2f019

Revert docker changes, and revert version override

a913efd

Revert version override

bd4ee10

revert change to quickstart/requirements

93322e2

Add blank spaces

0741118

tchow-zlai reviewed Dec 24, 2024

api/py/setup.py (outdated, resolved)

tchow-zlai approved these changes Dec 24, 2024

chewy-zlai added 3 commits January 2, 2025 16:38

Refactor run.py to use click and setup zipline command

f1ca3d1

format files

01ac2aa

fix multi-line comment

f35985d

coderabbitai bot reviewed Jan 3, 2025

chewy-zlai added 2 commits January 2, 2025 16:55

intellij format code

fef3186

fix formatting

ba5bcc9

coderabbitai bot reviewed Jan 3, 2025

Merge branch 'main' of https://github.com/zipline-ai/chronon into chewys/setup_python_build

80384d9

coderabbitai bot reviewed Jan 3, 2025

chewy-zlai added 2 commits January 5, 2025 12:11

Update test_run to use click instead of argparse and get the tests passing

7679303

fix lint

1d52e11

coderabbitai bot reviewed Jan 5, 2025

chewy-zlai changed the title ~~Rename compile.py and Build zipline-ai~~ Build zipline-ai with Commands "zipline compile" and "zipline run" Jan 5, 2025

coderabbitai bot reviewed Jan 5, 2025

revert changes to explore.py, leaving it out for now

294a6e9

chewy-zlai added 2 commits January 9, 2025 16:40

Include dataproc flag

a2cb521

Set default for dataproc flag

cc331be

coderabbitai bot reviewed Jan 10, 2025

chewy-zlai added 2 commits January 9, 2025 16:52

Add dataproc flag to main

96a2154

lint

8e23f46

coderabbitai bot reviewed Jan 10, 2025

chewy-zlai added 2 commits January 9, 2025 16:58

Don't import certifi as it causes snyk to complain

8c47670

update scala_version_for_spark

20bbf1e

chewy-zlai merged commit 588627c into main Jan 10, 2025
4 of 5 checks passed

chewy-zlai deleted the chewys/setup_python_build branch January 10, 2025 01:17

coderabbitai bot reviewed Jan 10, 2025

coderabbitai bot mentioned this pull request Jan 12, 2025

Python compile and API skeleton/docs #113

Merged

4 tasks

coderabbitai bot mentioned this pull request Jan 21, 2025

python api: allow string windows #253

Merged

4 tasks

coderabbitai bot mentioned this pull request Feb 1, 2025

feat: upgrade artifact upload to use bazel #314

Merged

4 tasks

coderabbitai bot mentioned this pull request Feb 11, 2025

Tail the dataproc job logs whenever we submit a dataproc job in run.py #359

Merged

4 tasks

This was referenced Mar 6, 2025

Migrate existing run quickstart scripts to dev customer id #473

Merged

stub/part_1: physical graph for workflow submission + column lineage #493

Merged

This was referenced Mar 15, 2025

feat: zipline init to create project scaffolding #512

Merged

compiler cutover #507

Closed

coderabbitai bot mentioned this pull request Apr 1, 2025

feat: align wheel and jar versions #559

Merged

4 tasks

This was referenced Apr 10, 2025

fix: compiler fixes #619

Merged

Refactor run.py to not hardcode against zipline-artifacts or zipline-warehouse buckets but configurable #658

Closed

This was referenced Apr 17, 2025

feat: improvements to zipline cli #644

Merged

ck followups #667

Merged

coderabbitai bot mentioned this pull request Jun 6, 2025

remove legacy compile #841

Merged

4 tasks
